Transcriptome-Based Study on the Phylogeny and Hybridization of Marattialean Ferns (Marattiaceae)

Marattiaceae is a phylogenetically isolated family of tropical eusporangiate ferns including six genera with more than one-hundred species. In Marattiaceae, monophyly of genera has been well-supported phylogenetically. However, the phylogenetic relationships among them were elusive and controversial. Here, a dataset of 26 transcriptomes (including 11 newly generated) were used to assess single-copy nuclear genes and to obtain the organelle gene sequences. Through phylotranscriptomic analysis, the phylogeny and hybridization events of Marattiaceae were explored and a robust phylogenomic framework for the evolution of Marattiaceae was provided. Using both concatenation- and coalescent-based phylogenies, the gene-tree discordance, incomplete lineage sorting (ILS) simulations, and network inference were examined. Except the low support with mitochondrial genes of Marattiaceae, nuclear genes and chloroplast genes strongly supported a sister relationship between Marattiaceae and leptosporangiate ferns. At the genus level, all phylogenetic analysis based on nuclear genes datasets recovered five genera in Marattiaceae as monophyletic with strong support. Danaea and Ptisana were the first two diverged clades in turn. Christensenia was a sister clade to the clade Marattia + Angiopteris s.l. In Angiopteris s.l., three clades (Angiopteris s.s., the Archangiopteris group, and An. sparsisora) were well identified with maximum support. The Archangiopteris group was derived from Angiopteris s.s. at ca. 18 Ma. The putative hybrid species An. sparsisora between Angiopteris s.s. and the Archangiopteris group was verified by the species network analyses and the maternal plastid genes. This study will improve our understanding for using the phylotranscriptomic method to explore phylogeny and investigate hybridization events for difficult taxa in ferns.


Phylogenetic Reconstruction Using Single-Copy Nuclear Genes
Based on the 20 single-copy nuclear gene datasets, a total of 40 species trees were inferred using concatenation and coalescent approaches in IQ-tree (Figures 2b, 3b,d, 4d, 5b, 6b, 7b, S1b,d, S2b, S3-S5b, S6b,d and S7-S12b) and Astral-II (Figures 2a, 3a,c, 4d, 5a, 6a, 7a, S1a,c, S2a, S3-S5a, S6a,c and S7-S12a), respectively. In analyses of nucleotide and amino acid sequences, Marattiaceae were found to be sister to Polypodiidae Ptisana is sister to the rest of Marattiaceae. Angiopteris s.l. is sister to Marattia, and they are together sister to Christensenia. In Angiopteris s.l., three well-supported clades were identified: Angiopteris s.s., the Archangiopteris group, and An. sparsisora (a potential hybrid; see below).   . Maximum likelihood bootstrap support (ML-BS) is shown above the branches, and quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches. Pie charts present the proportion of congruent gene trees that supports that clade (blue), the proportion of discordant gene trees of the main alternative topology for that clade (green), the proportion of discordant trees for the remaining alternative topologies (red), the proportion of uninformative gene trees (dark gray; bootstrap support <50%), and the proportion of missing data (light gray). (b,d). Maximum likelihood bootstrap support (ML-BS) is shown above the branches, and quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches. Pie charts present the proportion of congruent gene trees that supports that clade (blue), the proportion of discordant gene trees of the main alternative topology for that clade (green), the proportion of discordant trees for the remaining alternative topologies (red), the proportion of uninformative gene trees (dark gray; bootstrap support <50%), and the proportion of missing data (light gray). (b,d). Maximum likelihood bootstrap support (ML-BS) is shown above the branches, and quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches. Pie charts present the proportion of congruent gene trees that supports that clade (blue), the proportion of discordant gene trees of the main alternative topology for that clade (green), the proportion of discordant trees for the remaining alternative topologies (red), the proportion of uninformative gene trees (dark gray; bootstrap support <50%), and the proportion of missing data (light gray). (b,d). Maximum likelihood bootstrap support (ML-BS) is shown above the branches, and quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches.        (Figures 2a,b, S1a-d and S2a,b). However, the former (Danaea as the first evolutionary lineage) was with full QS support (1/−/1) (Figure 2b), the latter (the  (Figures 3a,b, 7a,b, S6a,b and S10-S12a,b). Angiopteris s.s. (except An. sparsisora) and the Archangiopteris group were recovered as monophyletic in all concatenation and coalescent species trees, with support by only 87/137 gene trees (out of 460/571; ICA = 0.16-1) (Figures 3a,b, 7a,b, S6a,b and S10-S12a,b) but with strong support (ML-BS = 100; AS-PP = 0.98-1) and full QS support (1/−/1) (Figures 3a,b, 7a,b, S6a,b and S10-S12a,b).

Phylogenetic Reconstruction of Organelle Genes
Using the customized pipeline, we recovered 87 and 32 protein-coding genes from the CT and MT datasets, respectively. Finally, alignments of CT dataset with 78,897 bp in length, MT dataset with 36,568 bp in length, and the combined matrix (CM) with 115,465 bp in length were used to infer the phylogeny. Detailed information on the assembled chloroplast genes and mitochondria genes are provided in Figshare: https://doi.org/10.6084/m9 .figshare.21707546.
At the family level, phylogenetic analysis of the CT and CM datasets recovered Marattiaceae sister to Polypodiidae with strong support (ML-BS = 98; Figures 4a,c, S13 and S15), but the MT dataset resolved Marattiaceae as sister to the clade Equisetaceae + Psilotaceae with weak support (ML-BS = 41; Figures 4b and S14). At the genus level, the plastid genes result corroborates the nuclear phylogeny ( Figure S13). The phylogeny based on mitochondrion genes (MT dataset) inferred that five genera are non-monophyletic but with extremely weak support ( Figure S14). The CM dataset infers that the clade Danaea + Ptisana is the first diverged clade with strong support (ML-BS = 90). Marattia is nested with Angiopteris s.l., and they are together sister to Christensenia ( Figure S15).
To avoid the impact of missing data of chloroplast genes, we only retain Marattia sp. as the outgroup to infer phylogeny of Angiopteris s.l. with chloroplast genes. Three mediumsupported clades (Angiopteris s.s. (ML-BS = 97), Angiopteris sparsisora (a putative hybrid species) clade, and the Archangiopteris group (ML-BS = 51)) were identified ( Figure S16). Angiopteris sparsisora was sister to Angiopteris s.s. based on chloroplast phylogenomics instead of sister to the Archangiopteris group in nuclear genes.

Coalescent Simulations
To investigate whether ILS alone could explain the cytonuclear and nuclear genetree discordance, coalescent simulation analysis with datasets of Family At the genus level and family level, distribution of tree-to-tree distances between simulated gene trees and the observed gene trees largely overlapped (Figures 5, 6c,d, S3-S5c,d and S7-S9c,d) using the Family-O-aa/Family-O-cds/Family-Saa/Family-S-cds/Genus-O-aa/Genus-O-cds/Genus-S-aa/Genus-S-cds datasets. However, using the same ILS simulation workflows within the remaining datasets at species level, ILS tests for most of the discordance in Angiopteris s.l. were accounted for. All the analyses distinguishing between simulated and observed trees distance distributions show that ILS cannot account for the discordance in Angiopteris s.l. (Figures 7c,d and S10-S12c,d).

Network Analysis
The datasets used in ILS simulations were used to perform phylogenetic network analysis in PhyloNet while accounting for both hybridization and ILS. At the family level (Figures 5e and S3-S5e), the inferred Network4 has the highest log pseudo-likelihood (Table 3). Gene flow was detected in all outgroups (Figures 5e and S3-S5e), indicating that hybridization could be a possible cause of discordance. Up to five hybridization events at the generic level in Marattiaceae were also examined; the inferred network with two reticulation events has the highest log pseudo-likelihood (Table 3)  In Angiopteris s.l., up to five hybridization events among Angiopteris s.s. were allowed (Figures 7e and S10-S12e). Angiopteris sparsisora was inferred to have a genetic contribution from the lineage of Angiopteris s.s. and the Archangiopteris group in all datasets, and its reticulation event was detected in all 20 examinations. Angiopteris sparsisora has an inheritance probability of 0.371-0.506 from the Angiopteris s.s. lineage and 0.494-0.629 from the Archangiopteris group lineage (Figures 7e and S10-S12e). Although the inferred network with the highest log pseudo-likelihood included one reticulation event (Table 3, except Species-S-cds dataset with two reticulations: −267.86003727524945), the generalevel hybridization events that included more reticulations, which were mainly caused by hybridization events among Angiopteris, were calculated (Figures 6e and S7-S9e).

Divergence Dating Estimation
Our analyses infer a stem age of Marattiaceae dating to the late Carboniferous-early Permian (ca. 319 Ma; Figure 8). The split between Danaea and the rest of extant Marattiaceae occurred in ca. 201 Ma, corresponding to the late Triassic. The stem age of Christensenia was estimated to be ca. 165 Ma (middle Jurassic), and the crown age was ca. 43 Ma (Figure 8). The stem age of Marattia was estimated to be ca. 46 Ma (middle Paleogene; Figure 8). In Angiopteris s.l., the Archangiopteris group diverged from Angiopteris s.s. at ca. 37 Ma during middle Cenozoic (Figure 8).

New Insights into Phylogeny of Marattiaceae
In previous phylogenetic reconstruction for Marattiaceae, less than four genera and

New Insights into Phylogeny of Marattiaceae
In previous phylogenetic reconstruction for Marattiaceae, less than four genera and four species were used for phylogenetic analysis based on nuclear genes [28,34,[37][38][39][40]. In this work, using a broad taxon sample of newly generated transcriptomes, the phylogenetic relationships of Marattiales were explored through analysis of uniparentally inherited organelle genes and single-copy biparental nuclear genes.
With extensive outgroups, two alternatives for the first divergent lineage, Danaea (Figure 2b) or the clade Danaea + Ptisana (Figures 2a, S1a-d and S2a,b), were found. The latter, however, has a higher proportion of gene tree conflicts and low ICA and QS values (Figures 2a,b, S1a-d and S2a,b). To further test the first lineage of Marattiaceae, conflict analysis results of different root settings were compared using single-copy nuclear genes specific to Marattiaceae, Danaea as the first evolutionary lineage of Marattiaceae with higher gene proportion (Figures 3a-d and S6a-d). In addition, phylogenetic reconstruction based on chloroplast genes also shows the root of Marattiaceae separates Danaea from the rest of the genus ( Figure S13). In all, our phylogenomic results are consistent with Murdock's [10] study based on chloroplast DNA data but with higher support for each clade (Figures 3a,b, 6a,b, S6a,b and S7-S9a,b). Although Eupodium was not included in our phylogenetic analysis, it has been resolved as the sister to Ptisana in all previous studies (e.g., [1,10,19,20,48]). Here, TreeShrink was further selected to reduce the influence of long branch attraction and improved the accuracy of phylogeny in all datasets, which could identify and remove those sequences that lead to unrealistically long branch lengths within each cluster [70]. Nearly all previous phylogenetic studies have resolved Danaea as the first divergent clade in Marattiaceae [1,10,19,20,23,48,56,63]. In this study, we also recovered Danaea as a sister to the rest of the family in phylogenetic analyses of the multi-locus plastid genes; all the conflict analyses based on different single-copy nuclear gene datasets and the all analyses were concordant (ICA = 1; QS = 1/−/1) (Figures 3a,b, 6a,b, S6a,b, S7-S9a,b and S13). Ptisana was sister to the clade comprised by three genera (Angiopteris s.l., Christensenia, and Marattia) in Marattiaceae. In some previous studies, phylogenetic placement of Danaea and Christensenia was tentative and they both showed long branches and possibly demonstrated convergent evolution in protein coding genes [1,10,63]. In our phylogenetic analysis, Christensenia was resolved as sister to the clade Marattia + Angiopteris s.l. with full support (ML-BS = 100; AS-PP = 1; QS = 1/−/1; ICA = 0.82-0.89), which was different from Lehtonen et al. [19]'s study that either recovered Christensenia as sister to Marattia (in plastid genome dataset) or as the second divergent clade sister to remaining genera except Danaea (in plastid genome and 77 phenotypic datasets). However, our inferred phylogeny based on multi-locus chloroplast genes also supports the same major clade position of Marattiaceae based on single-copy nuclear genes. In this case, two alternative explanations can be assumed: the phylogenetic reconstruction may be confounded by chloroplast RNA editing, and maternally inherited plastome representing only a single coalescent history may not reflect the species phylogenetic relationship [71][72][73][74][75]. The whole plastome is not "the holy grail" [76]; the effectiveness of plastome-scale data is ultimately reflected by the extent to which they reveal the "true" phylogenetic relationships of a given lineage [77]. Incomplete lineage sorting (ILS), introgressive hybridization, and horizontal gene trans-fer can also result in such high confliction [78]. Based on the phylogenetic analysis of transcriptome data and previous phylogenetic results, we determined that the most likely phylogenetic relationship of six genera in Marattiaceae is (Danaea, ((Ptisana + Eupodium), (Christensenia, (Marattia + Angiopteris s.l.)))).

Resolution of Angiopteris s.l.
In this study, a total of 10 samples representing 10 species of Angiopteris s.l. were included. Three well-supported clades (Angiopteris s.s., Angiopteris sparsisora, and the Archangiopteris group) were identified based on single-copy nuclear genes datasets and chloroplast genes (Figures 3a-d, 7a,b, S6a-d, S10-S12a,b and S16). Because of the similar morphology and poor phylogenetic resolution, Angiopteris s.s., Archangiopteris, Macroglossum, Protomarattia Hayata, and Protangiopteris Hayata were usually combined into Angiopteris s.l. [1,2,10,42,55,79]. In previous studies, Angiopteris s.s. often was not resolved as a monophyletic group [1,10,23,56,63]. Macroglossum has only one or two species with limited geographical distribution in Borneo and Sumatra [80]. Archangiopteris seems to be a monophyletic group [1] with about 10 species distributed in China and Vietnam [51][52][53]. We sampled four species and our phylogeny confirmed that the Archangiopteris group was a well-supported monophyletic clade in both nuclear gene datasets and chloroplast genes (Figures 3a-d, 7a,b, S6a-d, S10-S12a,b and S16). Angiopteris s.s. is paraphyletic, including two clades: the hybrid An. sparsisora and Angiopteris s.s., in all the nuclear gene datasets. Angiopteris sparsisora has been recognized as a putative hybrid between Angiopteris s.s. and the Archangiopteris group [42,57]. Ching and Wang [57] thought that An. sparsisora is more similar to Angiopteris s.s. morphologically and placed it in Angiopteris, a position that is also supported by our chloroplast genes ( Figure S16). However, our phylogenetic analysis showed that An. sparsisora has a closer relationship with the Archangiopteris group instead of Angiopteris s.s. in all nuclear gene datasets. Morphologically, the Archangiopteris group is different from Angiopteris s.s. in having creeping rhizomes and often once-pinnate blades, the stipes with one or several (1-7) naked pulvini (presenting in whole growth phases of plants) [81].
Archangiopteris (=the Archangiopteris group) was established by Christ and Giesenhagen [62] based on Ar. henryi (=An. latipinna). The prefix "arch-" of Archangiopteris means primitive, representing Archangiopteris with more primitive morphological characters (or earlier differentiation) than Angiopteris s.s., which was also demonstrated in a chloroplast genes dataset with one outgroup ( Figure S16). Such ideas were supported by some previous morphological studies and phylogenetic analysis [23,47,51,56,63]. However, our phylogenetic analysis recovered that Angiopteris s.s. was diverged earlier than the Archangiopteris group with full support value in all nuclear gene datasets (Figures 3a-d, 7a,b, 8, S6a-d and S10-S12a,b). An estimated divergence time based on nuclear genes showed that the Archangiopteris group evolved ca. 27 Ma, nearly 10 Ma later than Angiopteris. It was also noted by some authors [51,64] who thought that the Archangiopteris group probably was derived from Angiopteris s.s. based on the taxonomy, geographical distribution, morphological, and anatomical features. In order to understand the phylogenetic relationship of Angiopteris s.l. well, potentially, some taxonomic ranks (nothosubgenus, subgenus, or section) in Angiopteris s.l. could be recognized and proposed, and hybridization events in Marattiaceae need to be further investigated in the future.

Phylogenetic Incongruence as Further Evidence for ILS and Hybridization
With different materials and methods, a large number of previous phylogenetic studies have yielded many incongruent topologies in Marattiaceae (Figure 1) [1,10,19,20,48,56,63]. In addition, our analyses showed that there was a very high level of heterogeneity among single-copy nuclear gene trees, and cytonuclear discordance with different phylogenetic relationships of Equisetaceae, Marattiaceae, Ophioglossiodeae, Polypodiidae, and Psilotaceae. While these inconsistencies have never been well explained, it is often believed that phylogenetic incongruence in plants is mainly due to hybridization and ILS [82][83][84][85][86].
Here, the phylogenetic relationship of Marattiaceae was inferred using coalescent methods, which also allow us to assess the ancient hybridization, introgression, and incomplete lineage sorting (ILS) [87]. The confliction within gene trees and among concatenation and coalescent trees at all topologies was detected in this study (Figures 2a,b, 3a-d, 5a,b, 6a,b, 7a,b, S1a-d, S2a,b, S3-S5a,b, S6a-d and S7-S12a,b). At family level and genus level, nearly overlapping RF distances were detected between the empirical trees and the coalescent simulations, indicating that ILS alone could explain most of the gene tree discordance (Figures 5, 6c,d, S3-S5c,d and S7-S9c,d). Meanwhile, ILS is likely to apply regarding mainly reasons for family-level and genus-level conflict that diverged during rapid speciation events and/or had large population sizes [88]. We also detected interfamily and inter-genus geneflow among almost all lineages (Figures 3e, S2, S3 and S5e). High hybridization frequency had been reported in ferns [89][90][91][92][93], and it is generally believed that hybridization and allopolyploidization are the main driving forces in the evolution of ferns [94]. Topology conflict in Marattiaceae at family level and genus level was potentially caused by ILS and hybridization together.
In Angiopteris s.l., our simulation analysis showed a large difference in both frequency and tree-to-tree distance between the coalescent simulations and the empirical trees (Figures 7c,d and S10-S12c,d), and the ILS, therefore, cannot explain the unresolved phylogenetic relationship between Angiopteris s.s. and the Archangiopteris group. Potentially, gene flow is frequent between Angiopteris s.s. and the Archangiopteris group, leading to topological conflict. Hybridization analysis further suggested that An. sparsisora was of hybrid origin and was detected in all 20 examinations, with an inheritance probability of 0.371-0.506 from the Angiopteris s.s. lineage and 0.494-0.629 from the Archangiopteris group lineage (Figures 7e and S10-S12e). Using the command CalGTProb, the greatest likelihood of the network was computed. It indicated that only one hybridization event was selected, which conflicts with the previous hypothesis of intergeneric hybridization event times (Table 3). Angiopteris sparsisora is endemic to a valley under evergreen broad-leaved forests in Xichou County, Fadou Town, which is located in southeastern Yunnan Province, China. Ching and Wang [57] have assumed that this distinct species is a natural hybrid between Angiopteris s.s. and the Archangiopteris group, with a number of intermediate morphological characteristics [42,57]. Ching and Wang [57] placed An. sparsisora in Angiopteris s.s. rather than in the Archangiopteris group because of the fact that, in overall impression, the new taxon appears more similar to the former than the latter. China and northern Vietnam was thought to be the original center, with high diversity of the Archangiopteris group. In this area, a large number of species of Angiopteris s.s. and the Archangiopteris group occur, growing side by side in great abundance. Another presumptive hybrid species, An. bipinnata (=Ar. bipinnata), was sampled in our study. We confirmed that An. bipinnata is a member of the Archangiopteris group. However, our analysis did not detect obvious gene flow from other lineages (Figures 7e and S10-S12e). In Angiopteris s.s., some other hybrid species have also been reported. (e.g., An. sugongii, An. itoi (=Ar. itoi)), and it is necessary to investigate them with more materials and molecular data.

Taxon Sampling and RNA Sequencing
In this study, 26 samples representing 24 species (Table 1) were sampled, including almost all major lineages recognized in Marattiaceae plus outgroups. Eleven samples were newly generated in this study and 15 samples were obtained from GenBank SRA database (Table 1). In total, except Eupodium (a neotropical genus with three or four species [19,95]), all recognized genera of Marattiaceae in PPG I [2] were sampled. New transcriptome data were sequenced on an Illumina HiSeq2000 platform. Total RNA extraction, library preparation, raw data cleaning, and quality control were performed in 2018 at Majorbio, Shanghai, China.

Assembly and Single-Copy Orthologue Identification
Illumina adapter and poor-quality bases were removed from the raw reads of the transcriptomes using Fastp v0.12.4 [96] with default parameters. The cleaned data were used for de novo assembly using Trinity v2.8.5 [97] and only the longest transcripts were selected for subsequent analysis using Trinity's own perl script to remove over-represented sequences. Minimap2 v2.17-r941 [98] and Samtools v1.11 [99] were chosen to filter the chloroplast and mitochondrial transcripts using the closest publicly available reference chloroplast genome and mitochondrion genome (Table S1) as references. CD-HIT-EST v4.8.1 [100] was used to further remove redundant contigs with a threshold of 0.99. TransDecoder v5.5.0 [101] was used to identify candidate coding regions. To cluster transcripts into orthologous genes, the OrthoFinder v2.3.8 [102,103] and SonicParanoid v1.3.8 [104] software were used to infer core-orthogroups based on all-against-all searches with default parameters and only single-copy orthologues present in all samples were selected for subsequent analyses. This resulted in 92 single-copy nuclear genes that shared with all 26 samples in total by using OrthoFinder and 187 single-copy nuclear genes that shared with all 26 samples in total by using SonicParanoid, respectively.

Nuclear Genes Dataset Generation and Phylogenetic Inference
Considering the later hybridization analysis with computational restrictions and by using different datasets to reconstruct the relationship of Marattiaceae, we first constructed 20 single-copy nuclear gene datasets composed of different species (Table S2) and discussed the phylogenetic relationship at the family level, genus level, species level, and the root of Marattiaceae. Each of the acquired amino acid sequences was aligned first then trimmed by using plugins Mafft v7.450 (set E-INS-i) [105,106] and trimAI v1.3 (set -automated1) [107] in PhyloSuite v1.2.2 [108], respectively. Given that recombination within loci might bias the inference of the species tree [109], we further used PhiPack v1.1 [110] to test the pairwise homoplasy index Φ for recombination with default sliding window size of 100 bp, and alignments that showed a strong signal of recombination with p ≤ 0.05 were removed from all subsequent phylogenetic analyses. Subsequently, for the concatenation approach in which all genes in a concatenated matrix are considered a single locus to infer phylogeny, sequences of aligned genes in each dataset were concatenated using the PhyloSuite plugins "Concatenate Sequence". The concatenation ML tree inferred in IQ-tree v2.1.2 [111] with 10,000 ultrafast bootstraps [112] and performed model selection using the bias-corrected Akaike information criterion (AICc) in ModelFinder [113]. In addition, a coalescent approach was applied to estimate the species tree. Each gene tree was generated via IQ-tree with 1000 ultrafast bootstraps and the coalescent tree was inferred by Astral-II v5.7.8 [114]. In addition, TreeShrink v 1.3.7 [89] was selected to reduce the long branch attraction (LBA) influence. The retained sequences were used to construct the concatenation and coalescent trees with the same method mentioned above. Finally, the corresponding codon alignments were converted from amino acid sequences using PAL2NAL v14.0 [115] to generate datasets, remove recombination loci, reduce LBA, and for phylogeny reconstruction.

Organelle Genome Assembly, Locus Extraction, and Phylogenetic Analysis
Chloroplast and mitochondrion genes were obtained from assembly transcripts mapped to the closest available reference chloroplast and mitochondrion genome (Table S1) in Geneious Prime [116] with the parameter settings in the "Geneious RNA" program as highest sensitivity. Annotation was subsequently conducted in Geneious [set 90% similarity]. Then, we filled in the gaps with 'N' for all of the protein-coding genes and extracted. Each of the acquired protein-coding genes was aligned first then trimmed by using plugins Mafft (set E-INS-i) and trimAI (set -automated1) in PhyloSuite, respectively. For the concatenated matrix, we constructed chloroplast genes dataset (CT), mitochondrial genes dataset (MT), and chloroplast genes and mitochondrial genes combined dataset (CM) and then used the 5000 ultrafast bootstrap replicates for branch support in IQ-tree.

Concordance, ILS Simulation, and Hybridization Inference
We used two approaches to explore discordance among nuclear gene trees. First, we mapped the individual gene trees onto the coalescent tree and calculated the internode certainty all (ICA [117]) value to quantify the degree of conflict on each node and the number of conflicting and concordant bipartition on each node of the coalescent trees using the gene trees with maximum likelihood bootstrap supports (MLBS) values at least 50% for the corresponding node. Both ICA and conflicting/concordant bipartitions were calculated with Phyparts [118]. Then, we used quartet sampling (QS [119]) to distinguish strong conflict from weakly supported branches using the concatenation tree with 1000 replicates. For the ILS analysis, we used Dendropy v4.5.2 [120] to simulate gene trees using the species tree as a guide tree and generated a total of 20,000 gene trees. Then, we calculated the Robinson-Foulds (RF) distance between the species tree and each simulated or observed gene tree by using the workflow of Wang et al. [82]. To test the hypothesis of hybridization, we used PhyloNet v3.8.0 [121] to infer an evolutionary network with the command "In-ferNetworks_ML" [122] from individual gene trees. Due to computational restrictions, a maximum of five reticulation events was set and run with 50 runs to ensure accuracy. To estimate the optimal number of hybridizations, we computed the likelihood scores of the given individual gene trees using the command "CalGTProb" [123]. The networks were visualized in Dendroscope v3.8.1 [124].

Age Estimation
Divergence time estimations were performed based on the concatenated 16-O-cds dataset using penalized likelihood in treePL v2.6.3 [125]. We were mainly concerned with the chronological order of divergence of the two lineages of Angiopteris s.s. and the Archangiopteris group. We used two calibration points based on the estimation of Lehtonen et al. [19], who applied a parsimony-based total-evidence dating approach, which suggested a Triassic age for the extant crown group, which is consistent with the fossil records that have been found so far [12,49,126,127]: (1) an age estimate ranging from 307 Ma to 320 Ma for the Marattiales, and (2) the split between Danaea and the most recent common ancestor of the rest extant species of Marattiaceae was estimated in the Late Triassic (201-236 Ma). Following the guidelines of Maurin [128], cross-validation analysis was conducted with rate-smoothing values from 1010 to 10-30 and a "cvmultstep" of 0.1 and resulted in an optimal smoothing parameter of 0.00001. Lastly, used TreeAnnotater v2.4.7 to summarize all dated 1000 bootstrap replicates into a maximum clade credibility (MCC) tree with 0% of burn-in and mean node heights. The MCC tree was displayed in FigTree v1.4.3 [129].

Conclusions
Using different phylogenetic analyses, the phylogeny of Marattiaceae was reconstructed with a dataset from 26 transcriptomes. In general, our results are consistent with previous phylogenetic results [1,10]. The most likely phylogenetic relationship of six genera in Marattiaceae is (Danaea, ((Ptisana + Eupodium), (Christensenia, (Marattia + Angiopteris s.l.)))). Moreover, some new insights have been revealed for hybridization in Angiopteris s.l. The Archangiopteris group (=Archangiopteris) forms a well-supported clade that is nested within Angiopteris s.s. The putative hybrid species Angiopteris sparsisora between Angiopteris s.l. and the Archangiopteris group was confirmed by the network analysis and maternal chloroplast genes. ILS and gene flow could be the main reason for the puzzled systematists at family level, genus level, and species level in Marattiaceae.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/plants12122237/s1. Table S1: The references of complete organelle genome used in this study. Table S2: Sample composition of different datasets. Table S3: Information of transcriptomes included in this study. Figure S1: Phylogenomic analysis of concatenationbased trees from the 26-O-aa and 26-O-cds datasets in Marattiaceae (a,b. 26-O-aa datasets, c,d. 26-O-cds datasets); a,c. astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) values are shown above branches, and numbers of gene trees (concordant/conflicting) were shown below branches. Pie charts present the proportion of congruent gene trees that supports that clade (blue), the proportion of discordant gene trees of the main alternative topology for that clade (green), the proportion of discordant trees for the remaining alternative topologies (red), the proportion of uninformative gene trees (dark gray; bootstrap support <50%), and the proportion of missing data (light gray); b,d. maximum likelihood bootstrap support (ML-BS) is shown above the branches, and quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches. Figure S2: Phylogenetic analysis of species trees based on 26-S-cds in Marattiaceae. a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) values are shown above branches, and numbers of gene trees (concordant/conflicting) were shown below branches. Pie charts present the proportion of congruent gene trees that supports that clade (blue), the proportion of discordant gene trees of the main alternative topology for that clade (green), the proportion of discordant trees for the remaining alternative topologies (red), the proportion of uninformative gene trees (dark gray; bootstrap support <50%), and the proportion of missing data (light gray). b. Maximum likelihood bootstrap support (ML-BS) is shown above the branches, and quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches. Figure S3: a,b. Phylogenetic analysis of species tree with Family-O-cds dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Family-O-cds dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from PhyloNet pseudolikelihood analyses with one to five hybridization events based on Family-O-cds dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure S4: a,b. Phylogenetic analysis of species tree with Family-S-aa dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Family-S-aa dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from PhyloNet pseudolikelihood analyses with one to five hybridization events based on Family-S-aa dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure S5: a,b. Phylogenetic analysis of species tree with Family-S-cds dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Family-S-cds dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from PhyloNet pseudolikelihood analyses with one to five hybridization events based on Family-S-cds dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure S6: Phylogenomic analysis of species trees based on 16-O-aa and 16-S-aa datasets in Marattiaceae (a,b. 16-O-aa datasets; c,d. 16-S-aa datasets). a,c. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) values are shown above branches, and numbers of gene trees (concordant/conflicting) were shown below branches. Pie charts present the proportion of congruent gene trees that supports that clade (blue), the proportion of discordant gene trees of the main alternative topology for that clade (green), the proportion of discordant trees for the remaining alternative topologies (red), the proportion of uninformative gene trees (dark gray; bootstrap support <50%), and the proportion of missing data (light gray); b,d. maximum likelihood bootstrap support (ML-BS) is shown above the branches, and quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches. Figure S7: a,b. Phylogenetic analysis of species tree with Genus-O-aa dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Genus-O-aa dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from PhyloNet pseudolikelihood analyses with one to five hybridization events based on Genus-O-aa dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure S8: a,b. Phylogenetic analysis of species tree with Genus-S-aa dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Genus-S-aa dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from Phy-loNet pseudolikelihood analyses with one to five hybridization events based on Genus-S-aa dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure  S9: a,b. Phylogenetic analysis of species tree with Genus-S-cds dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Genus-S-cds dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from PhyloNet pseudolikelihood analyses with one to five hybridization events based on Genus-S-cds dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure S10: a,b. Phylogenetic analysis of species tree with Species-O-cds dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Species-O-cds dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from Phy-loNet pseudolikelihood analyses with one to five hybridization events based on Species-O-cds dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure  S11: a,b. Phylogenetic analysis of species tree with Species-S-aa dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Species-S-aa dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from PhyloNet pseudolikelihood analyses with one to five hybridization events based on Species-S-aa dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node. Figure S12: a,b. Phylogenetic analysis of species tree with Species-S-cds dataset in Marattiaceae (a. Astral-II posterior probabilities (AS-PP) and internode certainty all (ICA) scores are shown above branches and numbers of gene trees (concordant/conflicting) are shown next to the nodes; b. maximum likelihood bootstrap support (ML-BS) is shown above the branches. Quartet sampling internal node scores (quartet concordance/quartet differential/quartet informativeness) are shown below branches); c. distributions of topology frequencies of observed and simulated gene trees based on Species-S-cds dataset; d. distribution of Robinson-Foulds distances of the simulated and observed gene trees to the coalescent tree; e. species network inferred from PhyloNet pseudolikelihood analyses with one to five hybridization events based on Species-S-cds dataset. Red and blue curved branches indicate the minor and major edges of hybrid nodes, respectively. Numbers next to curved branches indicate inheritance probabilities for each hybrid node.